70 research outputs found

    Vizuális kategóriák tanulása = Learning visual categories

    Get PDF
    The primary goal of the present work was to develop methods for representing visual information that integrate appearance and structural visual cues. During our research we dealt with modelling objects' appearance and structure from single and multiple views, integrating different visual cues into unified models, applying statistical learning algorithms, and categorizing objects. The developed methods were applied to categorizing vehicles by type and face images by gender and emotion. The obtained results demonstrate that integrating these visual cues significantly improves the performance of classical visual categorization and recognition methods.
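
    As a rough illustration of the kind of cue integration described above, the sketch below (a minimal Python example, not the project's actual code) concatenates a placeholder appearance descriptor and a placeholder structure descriptor into one feature vector and trains a standard statistical classifier on it; the feature extractors, the SVM choice, and the synthetic data are all assumptions made for the example.

        # Minimal sketch: fuse appearance and structure cues into one descriptor
        # and train a statistical classifier on it. Both extractors are crude
        # placeholders, not the descriptors used in the project.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        def appearance_features(image):
            # Placeholder appearance cue: grayscale intensity histogram.
            hist, _ = np.histogram(image, bins=32, range=(0, 255), density=True)
            return hist

        def structure_features(image):
            # Placeholder structure cue: gradient-magnitude statistics.
            gy, gx = np.gradient(image.astype(float))
            mag = np.hypot(gx, gy)
            return np.array([mag.mean(), mag.std(), np.abs(gx).mean(), np.abs(gy).mean()])

        def fused_descriptor(image):
            # Integration step: concatenate the two cue vectors into one model input.
            return np.concatenate([appearance_features(image), structure_features(image)])

        # Synthetic stand-in data (random "images" with two class labels), only to show the pipeline.
        rng = np.random.default_rng(0)
        images = rng.integers(0, 256, size=(200, 64, 64))
        labels = rng.integers(0, 2, size=200)

        X = np.stack([fused_descriptor(im) for im in images])
        X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)
        clf = SVC(kernel="rbf").fit(X_train, y_train)
        print("held-out accuracy:", clf.score(X_test, y_test))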

    Video camera registration using accumulated co-motion maps

    Get PDF
    The paper presents a method for registering partially overlapping camera views of scenes in which the objects of interest are in motion, even when both the environment and the motion are unstructured. In a typical outdoor multi-camera system the observed objects may appear very different due to changes in lighting conditions and different camera positions, so static features such as color, shape, and contours cannot be used for camera registration in these cases. Matching is instead performed by calculating co-motion statistics, followed by outlier rejection and a nonlinear optimization. The described robust algorithm finds point correspondences between two camera views (images) without searching for any objects and without tracking any continuous motion. Real-life outdoor experiments demonstrate the feasibility of our approach.
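
    As a loose sketch of the co-motion idea (an assumption-laden Python reconstruction, not the paper's code): for a few probe pixels in one view, accumulate how often each pixel of the other view is moving at the same instants; the peak of the accumulated map gives a candidate point correspondence, after which outlier rejection and nonlinear optimization of the view-to-view transform (e.g. RANSAC plus homography refinement) would follow.

        # Minimal sketch of accumulated co-motion statistics between two synchronized views.
        import numpy as np

        def comotion_peaks(masks_a, masks_b, probe_points):
            """masks_a, masks_b: (T, H, W) binary motion masks from the two views.
            probe_points: list of (row, col) pixels in view A to build statistics for."""
            T, H, W = masks_b.shape
            matches = []
            for (r, c) in probe_points:
                comotion = np.zeros((H, W), dtype=float)
                for t in range(T):
                    if masks_a[t, r, c]:           # probe pixel moved in view A at time t
                        comotion += masks_b[t]     # accumulate simultaneous motion in view B
                if comotion.max() > 0:
                    peak = np.unravel_index(np.argmax(comotion), comotion.shape)
                    matches.append(((r, c), tuple(int(v) for v in peak)))
            return matches  # candidate point correspondences between the two views

        # Toy synchronized motion masks, only to exercise the function:
        # view B is view A shifted by (3, -2) pixels.
        rng = np.random.default_rng(1)
        masks_a = rng.random((200, 40, 40)) < 0.1
        masks_b = np.roll(masks_a, shift=(3, -2), axis=(1, 2))
        print(comotion_peaks(masks_a, masks_b, probe_points=[(10, 10), (20, 25)]))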

    Towards Contrastive Learning in Music Video Domain

    Full text link
    Contrastive learning is a powerful way of learning multimodal representations across various domains such as image-caption retrieval and audio-visual representation learning. In this work, we investigate whether these findings generalize to the domain of music videos. Specifically, we create a dual encoder for the audio and video modalities and train it using a bidirectional contrastive loss. For the experiments, we use an industry dataset containing 550,000 music videos as well as the public Million Song Dataset, and evaluate the quality of the learned representations on the downstream tasks of music tagging and genre classification. Our results indicate that pre-trained networks without contrastive fine-tuning outperform our contrastive learning approach when evaluated on both tasks. To gain a better understanding of why contrastive learning was not successful for music videos, we perform a qualitative analysis of the learned representations, revealing why contrastive learning might have difficulty uniting embeddings from the two modalities. Based on these findings, we outline possible directions for future work. To facilitate the reproducibility of our results, we share our code and the pre-trained model. Comment: 6 pages, 2 figures, 2 tables
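
    A minimal sketch of a dual encoder trained with a bidirectional (symmetric) contrastive loss of the kind described above, assuming precomputed clip-level audio and video features; the projection heads, feature dimensions, and temperature are illustrative assumptions, not the authors' released code.

        # Minimal sketch: dual encoder + symmetric InfoNCE loss over an audio/video batch.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DualEncoder(nn.Module):
            def __init__(self, audio_dim=128, video_dim=512, embed_dim=256):
                super().__init__()
                self.audio_proj = nn.Sequential(nn.Linear(audio_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))
                self.video_proj = nn.Sequential(nn.Linear(video_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))

            def forward(self, audio_feats, video_feats):
                # L2-normalize so dot products are cosine similarities.
                a = F.normalize(self.audio_proj(audio_feats), dim=-1)
                v = F.normalize(self.video_proj(video_feats), dim=-1)
                return a, v

        def bidirectional_contrastive_loss(a, v, temperature=0.07):
            logits = a @ v.t() / temperature               # (B, B) similarity matrix
            targets = torch.arange(a.size(0), device=a.device)
            # Symmetric loss: audio-to-video and video-to-audio cross-entropy terms.
            return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

        # Toy batch of precomputed clip-level features, only to show one training step.
        model = DualEncoder()
        audio = torch.randn(16, 128)
        video = torch.randn(16, 512)
        loss = bidirectional_contrastive_loss(*model(audio, video))
        loss.backward()
        print(float(loss))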

    Higher order symmetry for non-linear classification of human walk detection

    Get PDF
    The paper focuses on motion-based information extraction from cluttered video image sequences. A novel method is introduced that can reliably detect walking human figures in such images. The method works on spatio-temporal input information to detect and classify patterns typical of human movement. Our algorithm consists of real-time operations, which is an important factor in practical applications. The paper presents a new information-extraction and temporal-tracking method based on a simplified version of symmetry pattern extraction, where the extracted pattern is characteristic of the moving legs of a walking person. These spatio-temporal traces are labelled by kernel Fisher discriminant analysis (KFDA). Using temporal tracking and non-linear classification, we have achieved pedestrian detection in cluttered image scenes with a correct classification rate of 97.6% from 1-2 step periods. The detection rates of a linear classifier and an SVM are also presented in the results, thereby demonstrating the necessity of a non-linear method and the power of KFDA for this detection task.
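
    The classification stage can be sketched as a two-class kernel Fisher discriminant in its dual formulation; the symmetry-trace feature extraction is not reconstructed here, and the RBF kernel, regularization value, and toy feature vectors below are assumptions made purely for illustration, not the paper's implementation.

        # Minimal sketch: two-class kernel Fisher discriminant analysis (dual form).
        import numpy as np

        def rbf_kernel(X, Y, gamma=0.5):
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
            return np.exp(-gamma * d2)

        def kfda_fit(X, y, gamma=0.5, reg=1e-3):
            K = rbf_kernel(X, X, gamma)
            idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
            m0, m1 = K[:, idx0].mean(axis=1), K[:, idx1].mean(axis=1)
            # Within-class scatter in the kernel-induced feature space.
            N = np.zeros_like(K)
            for idx in (idx0, idx1):
                Kc = K[:, idx]
                n_c = len(idx)
                N += Kc @ (np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)) @ Kc.T
            alpha = np.linalg.solve(N + reg * np.eye(len(X)), m1 - m0)
            proj = K @ alpha
            thresh = 0.5 * (proj[idx0].mean() + proj[idx1].mean())
            return alpha, thresh

        def kfda_predict(X_train, alpha, thresh, X_new, gamma=0.5):
            return (rbf_kernel(X_new, X_train, gamma) @ alpha > thresh).astype(int)

        # Toy "walk" vs "non-walk" feature vectors, only to exercise the classifier.
        rng = np.random.default_rng(2)
        X = np.vstack([rng.normal(0.0, 1.0, (50, 6)), rng.normal(2.0, 1.0, (50, 6))])
        y = np.repeat([0, 1], 50)
        alpha, thresh = kfda_fit(X, y)
        print("training accuracy:", (kfda_predict(X, alpha, thresh, X) == y).mean())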

    Behavior and event detection for annotation and surveillance

    Get PDF
    Visual surveillance and activity analysis is an active research field of computer vision, and as a result several different algorithms have been produced for this purpose. To obtain more robust systems it is desirable to integrate these different algorithms. To achieve this goal, the paper presents results on automatic event detection in surveillance videos and a distributed application framework for supporting these methods. Results on motion analysis for static and moving cameras, automatic fight detection, shadow segmentation, discovery of unusual motion patterns, and indexing and retrieval are presented. These applications run in real time and are suitable for real-life use.